CLAIMS

What is claimed is: 

1. A computer implemented method for center-based clustering a 
set, S, of n points to identify k centers through sampling of large data sets, 
wherein k is an integer value greater than one, the method comprising the steps 
of: 

determining at least one representational value of a diameter of a space
M that comprises said set S of said n points;

obtaining a sample R from said set S of said n points; 

determining at least one cluster for said sample R; and 

outputting centers $c_1, \ldots, c_k$ as identified by said cluster of said sample R.
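Claim 1 does not fix the clustering routine that is run on the sample. The sketch below is illustrative only. It assumes Euclidean points and uses two swapped-in techniques: the maximum distance from an arbitrary point as the representational value of the diameter (always between half the diameter and the full diameter), and Gonzalez's farthest-first traversal as the k-center routine on R. The names `diameter_estimate`, `farthest_first`, `sample_and_cluster` and the `sample_size` parameter are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def diameter_estimate(points: Sequence[Point]) -> float:
    """Max distance from an arbitrary point; lies between diam/2 and diam."""
    p0 = points[0]
    return max(math.dist(p0, q) for q in points)


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy farthest-first traversal (Gonzalez), a 2-approximation for k-center."""
    centers = [points[0]]
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Next center: the point farthest from all centers chosen so far.
        i = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[i])
        nearest = [min(nd, math.dist(p, points[i])) for nd, p in zip(nearest, points)]
    return centers


def sample_and_cluster(S: Sequence[Point], k: int, sample_size: int, seed: int = 0):
    """Hypothetical driver: estimate the diameter, draw sample R from S, cluster R."""
    rng = random.Random(seed)
    M = diameter_estimate(S)
    R = rng.sample(list(S), min(sample_size, len(S)))
    return M, farthest_first(R, k)
```

For example, `sample_and_cluster([(0, 0), (0, 1), (10, 10), (10, 11)], 2, 4)` returns one center from each of the two tight groups. The sketch simply returns the diameter estimate alongside the centers; how the claimed method uses that value (for instance, to size the sample) is not modeled here.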

2. The method as set forth in claim 1, further comprising the step of
reducing the number of dimensions d to log n, if d is larger than log n, prior to
determining said representational value of said diameter of said space M.
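Claim 2 does not say how the dimension is reduced. One common choice, assumed here and not taken from the claim, is a Gaussian random projection in the spirit of the Johnson-Lindenstrauss lemma. The sketch reads "log n" as the natural logarithm rounded up; `reduce_dimensions` and `seed` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def reduce_dimensions(points: Sequence[Sequence[float]], seed: int = 0) -> List[List[float]]:
    """Project n points in d dimensions down to ceil(ln n) dimensions when d exceeds it."""
    n, d = len(points), len(points[0])
    m = max(1, math.ceil(math.log(n)))
    if d <= m:
        # Claim 2 only reduces when d is larger than log n.
        return [list(p) for p in points]
    rng = random.Random(seed)
    # Gaussian entries scaled by 1/sqrt(m) preserve squared norms in expectation.
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)] for _ in range(m)]
    return [[sum(r[j] * p[j] for j in range(d)) for r in proj] for p in points]
```

Taking the target of log n dimensions literally keeps the sketch close to the claim wording; Johnson-Lindenstrauss distortion bounds would normally scale the target dimension with an accuracy parameter as well.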

3. The method as set forth in claim 2, further comprising the steps of:

executing a discrete clustering of sample R in the reduced space and
translating said centers back to the original space prior to outputting them.
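A discrete clustering picks its centers from the sample points themselves, so translating centers back to the original space can be done by index lookup. The sketch below shows this under stated assumptions: exhaustive discrete k-median (minimizing the sum of distances) stands in for whatever discrete clustering the method uses, and it is only practical for tiny samples. `discrete_k_median` and `centers_in_original_space` are hypothetical names; `reduced[i]` is assumed to be the projection of `original[i]`.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def discrete_k_median(reduced: Sequence[Sequence[float]], k: int) -> Tuple[int, ...]:
    """Exhaustive discrete k-median: returns the indices of the chosen sample points."""
    def cost(idx: Tuple[int, ...]) -> float:
        return sum(min(math.dist(p, reduced[i]) for i in idx) for p in reduced)
    return min(combinations(range(len(reduced)), k), key=cost)


def centers_in_original_space(original: Sequence, reduced: Sequence, k: int) -> List:
    """Cluster in the reduced space, then map each center back by its index."""
    idx = discrete_k_median(reduced, k)
    return [original[i] for i in idx]
```

Because every center is a sample point, no inverse of the projection is needed; that is one reason to prefer a discrete clustering in the reduced space.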

4. The method as set forth in claim 1, wherein the size of the
sample R is greater than or equal to the resulting value of:
[formula illegible] if R is in Euclidean space; and
[formula illegible] if R is in a metric space.



5. The method as set forth in claim 1, wherein the step of
determining said diameter M, if M is unknown, comprises the step of obtaining
a sample of size greater than or equal to the resulting value of:

$$\frac{2d}{\varepsilon}\log\frac{2d}{\delta}.$$
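The bound in claim 5 can be evaluated directly. In this sketch, ε and δ are read as an accuracy parameter and a failure probability (they are not defined in this excerpt), and the logarithm is taken as natural; both are assumptions. The 2d factor is suggestive of bounding each of the d coordinates from both sides, so a bounding-box estimator is included as a hypothetical way to turn the sample into a diameter value. `diameter_sample_size` and `bounding_box_diameter` are illustrative names.

```python
import math
from typing import Sequence


def diameter_sample_size(d: int, eps: float, delta: float) -> int:
    """Smallest integer >= (2d/eps) * ln(2d/delta); natural log assumed."""
    return math.ceil((2 * d / eps) * math.log(2 * d / delta))


def bounding_box_diameter(sample: Sequence[Sequence[float]]) -> float:
    """HYPOTHETICAL estimator: diagonal of the sample's axis-aligned bounding box."""
    d = len(sample[0])
    spans = [max(p[j] for p in sample) - min(p[j] for p in sample) for j in range(d)]
    return math.sqrt(sum(s * s for s in spans))
```

For d = 2, ε = 0.1 and δ = 0.05 the bound is 40 · ln 80 ≈ 175.3, so at least 176 points are drawn.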



6. A computer implemented method for assessing a quality of
conjunctive clusters $t_1, \ldots, t_k$, comprising the steps of:

determining a length of each respective conjunction $t_i$;

determining a probability of each respective conjunction $t_i$;

summing the product of the length of the conjunction $t_i$ with the
probability of $t_i$ for $i$ ranging from 1 to $k$;

wherein the $k$ conjunctions $t_i$ cover all but $\gamma$ of the distribution; and
wherein maximizing said sum of products optimizes the
conjunctive clusters.
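Written as a formula, the quantity summed in claim 6 is shown below. Reading "cover all but γ of the distribution" as a constraint on the probability of the union of the conjunctions is an interpretation, not a quotation from the claim.

$$
Q(t_1,\ldots,t_k) \;=\; \sum_{i=1}^{k} |t_i|\,\Pr[t_i],
\qquad\text{subject to}\qquad
\Pr\Big[\textstyle\bigvee_{i=1}^{k} t_i\Big] \;\ge\; 1-\gamma .
$$

With illustrative numbers, $t_1 = x_1 \wedge x_2$ (length 2, probability 0.3) and $t_2 = \neg x_1$ (length 1, probability 0.6) give $Q = 2(0.3) + 1(0.6) = 1.2$; since the two conjunctions are disjoint, they cover $0.9$ of the distribution, which satisfies the constraint for any $\gamma \ge 0.1$.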

7. The method set forth in claim 6, wherein the length of each 
respective said conjunction is determined by determining a number of variables 
in each respective said conjunction. 
8. The method set forth in claim 7, wherein the step of determining
a probability of each respective conjunction $t_i$ comprises the step of
determining a number of points that satisfy said conjunction.
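Claims 7 and 8 make both factors countable. The sketch assumes binary feature vectors and encodes a conjunction as a mapping from variable index to required bit; that encoding and the names `length`, `probability` and `quality` are hypothetical. The probability is estimated as the fraction of the given points that satisfy the conjunction.

```python
from typing import Dict, Sequence, Tuple

Conjunction = Dict[int, int]  # hypothetical encoding: variable index -> required bit
Point = Tuple[int, ...]


def length(t: Conjunction) -> int:
    """Claim 7: the length is the number of variables in the conjunction."""
    return len(t)


def probability(t: Conjunction, points: Sequence[Point]) -> float:
    """Claim 8: count the points satisfying every literal of t, as a fraction."""
    hits = sum(all(p[v] == b for v, b in t.items()) for p in points)
    return hits / len(points)


def quality(ts: Sequence[Conjunction], points: Sequence[Point]) -> float:
    """Claim 6: sum over i of length(t_i) * probability(t_i)."""
    return sum(length(t) * probability(t, points) for t in ts)
```

On the points `(1,1,0), (1,1,1), (0,0,1), (0,1,0)`, the conjunctions `{0: 1, 1: 1}` and `{0: 0}` each hold on half the points, giving a quality of 2(0.5) + 1(0.5) = 1.5.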

9. A computer implemented method for disjoint conjunction 
clustering a set S of n points through sampling of large data sets, the method 
comprising the steps of: 

obtaining a sample R from said set S;
generating a plurality of signatures of k disjoint conjunctions; 
enumerating over each individual signature q of said plurality of
signatures of k disjoint conjunctions, by:

partitioning said sample R into buckets $B_1, \ldots, B_k$ according to said
signature q;

inducing additional buckets as needed; 

determining a conjunction $t_i$ for each bucket of points $B_i$ that
represents the most specific conjunction that satisfies the points in $B_i$;

computing an empirical frequency $R(t_i)$;

assessing a numeric quality representation from a summation of the
product of the length of $t_i$ and the empirical frequency $R(t_i)$ over all said buckets
induced by said signature q; and

outputting the k disjoint conjunctions of said sample R that exhibit the
highest absolute value of said numeric quality representation.
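The claim does not define what a signature contains, so the sketch below adopts a hypothetical definition: a signature is a set V of `width` splitting variables together with k distinct bit patterns on V, one per bucket. Each sample point goes to the bucket whose pattern it matches on V, and the most specific conjunction of a bucket keeps every literal its points agree on. Because each such conjunction contains its bucket's pattern on V, the k conjunctions are pairwise disjoint. The step of inducing additional buckets is not modeled, and all names here are illustrative.

```python
from itertools import combinations, product
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]        # binary feature vector
Conjunction = Dict[int, int]   # hypothetical encoding: variable index -> required bit


def satisfies(p: Point, t: Conjunction) -> bool:
    return all(p[v] == b for v, b in t.items())


def most_specific_conjunction(bucket: Sequence[Point]) -> Conjunction:
    """Keep every literal on which all points in the bucket agree."""
    first = bucket[0]
    return {v: first[v] for v in range(len(first))
            if all(p[v] == first[v] for p in bucket)}


def signatures(d: int, k: int, width: int) -> Iterator[Tuple[Tuple[int, ...], Tuple[Tuple[int, ...], ...]]]:
    """HYPOTHETICAL signature: `width` splitting variables V plus k distinct
    bit patterns on V, one pattern per bucket."""
    for V in combinations(range(d), width):
        for patterns in combinations(list(product((0, 1), repeat=width)), k):
            yield V, patterns


def disjoint_conjunction_clustering(sample: Sequence[Point], k: int, width: int):
    n, d = len(sample), len(sample[0])
    best: Optional[List[Conjunction]] = None
    best_q = float("-inf")
    for V, patterns in signatures(d, k, width):
        # Partition R into buckets B_1..B_k by each point's pattern on V;
        # points matching no pattern stay unassigned in this sketch.
        buckets = [[p for p in sample if tuple(p[v] for v in V) == pat]
                   for pat in patterns]
        if any(not b for b in buckets):
            continue  # skip signatures that leave a bucket empty
        conjs = [most_specific_conjunction(b) for b in buckets]
        # Quality: sum over buckets of |t_i| times the empirical frequency R(t_i).
        q = sum(len(t) * sum(satisfies(p, t) for p in sample) / n for t in conjs)
        if q > best_q:
            best, best_q = conjs, q
    return best, best_q
```

Enumeration is exhaustive, so its cost grows with the number of variable subsets times the number of pattern choices; the sketch is meant for small d and k only.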
10. The method as set forth in claim 9, wherein the size of the sample R
is greater than or equal to:

$$\min\left\{\frac{1}{\gamma}\left(dk\ln 3+\ln\frac{2}{\delta}\right),\;
\frac{[\text{illegible}]}{\varepsilon}\left(d\ln 3+\ln\frac{2}{\delta}\right)\right\}.$$